Scalable computation of data

ABSTRACT

Techniques for producing a cross tabulation are described. The techniques involve issuing a plurality of queries to a database. The queries are for each of at least one sublevel of data for each of at least one dimension of data associated with records in the database. The queries provide sublists of sorted identifiers for each one of the queries. The technique determines occurrences of intersections of levels of one dimension with levels of another dimension of the data associated with records in the database by traversing the sublists to detect intersections of the dimensions.

BACKGROUND

This invention relates to data mining and in particular to evaluatingdata from a database producing results similar to those of using on-lineanalytical processing (OLAP) but in a far more computationally efficientmanner.

In problems such as in extracting market data from a database, data isoften organized in dimensions that are in a hierarchy. For example,records are often assigned ID's and the records will have data forvarious attributes that a user may wish to track. An example of adimension hierarchy might be age. The hierarchy of age can have levelsas young, middle, and old. Within each of these levels of young, middleand old can be various numeral age groupings or sublevels such as youngbeing 18-25 or 25-30; middle being 30-40 and 40-55; and old being 55-65and 65 and over, and so forth. A second hierarchy might be income, withincome having different levels and sublevels. Competing approaches toevaluate cross tabulations of age and income in this example usetechniques where the number of computations is related to the number ofdimensions and number of levels or sublevels of the data. For verycomplex or large number of dimensions, the computations increase at anexponential rate.

SUMMARY

According to an aspect of the present invention, a method of producing across tabulation, includes issuing a plurality of queries to a database,the queries being for multiple sublevels of data for multiple dimensionsof data associated with records in the database to provide a sublist ofsorted record identifiers for each one of the queries and determiningoccurrences of intersections of levels of one dimension with levels ofanother dimension of the data associated with records in the database bytraversing the sub-lists to detect intersections of the dimensions.

According to a further aspect of the present invention, a computerprogram product resides on a computer readable medium. The computerprogram product is for producing a cross tabulation structure. Thecomputer program includes instructions for causing a computer to issue aplurality of queries to a database, the queries being for multiplesublevels of data for multiple dimensions of data associated withrecords in the database to provide a sublist of sorted recordidentifiers for each one of the queries, determine occurrences ofintersections of levels of one dimension with levels of anotherdimension of the data associated with records in the database bytraversing the sub-lists to detect intersections of the dimensions, andindicate in a cross-tabulation structure each time an intersection ofone dimension with levels of another dimension of the data is found.

According to a further aspect of the present invention, an apparatusincludes a processor, a memory coupled to the processor, and a computerstorage medium. The computer storage medium stores a computer programproduct for producing a cross tabulation structure. The computer programincludes instructions which when executed in memory by the processor,causing the apparatus to issue a plurality of queries to a database, thequeries being for multiple sublevels of data for multiple dimensions ofdata associated with records in the database to provide a sublist ofsorted record identifiers for each one of the queries, determineoccurrences of intersections of levels of one dimension with levels ofanother dimension of the data associated with records in the database bytraversing the sub-lists to detect intersections of the dimensions andindicate in a cross-tabulation structure each time an intersection ofone dimension with levels of another dimension of the data is found.

One or more aspects of the invention may provide one or more of thefollowing advantages.

The process allows the user to specify the dimensions in a querystatement, thus allowing the user to specify 2 dimensions, 3 dimensions,and so forth. The process executes sets of queries for each specifieddimension only once, while construction of each structure isaccomplished by matching/merging sorted ID lists. The process performspre-aggregation of data for fast display/drill-down by computing astructure quickly after some initial sorting operations. The process canwork over multiple dimensions of data, where it is needed to aggregatedata over multiple dimensions for analysis while avoiding an exponentialgrowth situation. The algorithm performs a very efficient 1-pass throughthe data.

The process provides a number of performance improvements over competingprocesses. For instance, the speed of calculations is based on the sumof the number of levels over all dimensions or the sum of the mostgranular number of levels for each dimension if the hierarchy can berolled up from lower levels.

For a single-dimension query, the computation is of the order (n log n),where n is the number of rows of data being processed, assuming that thedata is not sorted. For calculating multiple dimensions, the calculationis of the order of (n log n) times m, where m is the number of levelsacross all dimensions (or the number at the most granular levels acrossall dimensions if the hierarchy can be rolled up from lower levels). Ifthe data is returned from a database with the fields already sorted, thecalculation complexity is of the order (n×m). This approach can be 10 to100 times faster than competing approaches which have a calculations onthe order of n*the number of complex queries=f (number of dimensions andlevels).

The details of one or more embodiments of the invention are set forth inthe accompanying drawings and the description below. Other features,objects, and advantages of the invention will be apparent from thedescription and drawings, and from the claims.

DESCRIPTION OF DRAWINGS

FIG. 1 is a block diagram of a computer system accessing a database.

FIG. 2 is a block diagram an exemplary record in the database.

FIG. 3 is a flow chart of a cross-tabulation technique.

FIG. 4 is a block diagram of a table.

FIG. 5 is diagram depicting sorted lists and cursors.

FIG. 6 is a diagram depicting a two-dimension cross-tabulationstructure.

FIG. 7 is a flow chart of a merging technique.

FIG. 8 is a diagram depicting a three-dimensional cross-tabulationstructure.

DETAILED DESCRIPTION

Referring now to FIG. 1, a computer system 10 includes a CPU 12, mainmemory 14 and persistent storage device 16 all coupled via a computerbus 18. The system 10 also includes output devices such as a display 20and a printer 22, as well as user-input devices such as a keyboard 24and a mouse 26. Not shown in FIG. 1 but necessarily included in a systemof FIG. 1 are software drivers and hardware interfaces to couple all theaforementioned elements to the CPU 12.

The computer system 10 also includes marketing automation/CampaignManagement software 30 that resides in storage 16 and which operates inconjunction with a database 32. The marketing automation/CampaignManagement software 30 supports various types of campaign programs. Themarketing automation/Campaign Management software 30 allows a user toquickly form cross-tabulations of records in the database using across-tabulation process 40. The marketing automation/CampaignManagement software 30 is shown residing in storage 16 but could residein storage in server 28 as part of a client-server arrangement, or canbe configured in other manners.

Referring now to FIG. 2, a data set includes a plurality of records withrecord 33 being illustrative. The record 33 can include an identifierfield 34 a, as well as one or a plurality of fields 34 b correspondingto values that may be used in the marketing automation/CampaignManagement software 30. The record 33 also includes a plurality ofresult fields 35 that are used by a modeling process (part of themarketing automation/Campaign Management software 30 or independentsoftware on either the computer system 10 or the server 28) to recordscores for the record 33. The record 33 can also include key fields (notshown) that are used to join and navigate between database tables (notshown). Typically, for each of the records, one (or more) of the fieldswould be a primary key for that record in the record's primary table andthe others would be secondary keys for tables that it might be joined toaccording to some characteristic or search request.

Referring to FIG. 3, cross-tabulation process 40 produces an n×n (e.g.,a 4×4 cube or cross-tabulation structure) (FIG. 6) of counts ofoccurrences of records 33 in the database 32 that have intersectinglevels of data for different dimensions of data (fields) in the records33.

However, the algorithm does not require the number of sublevels in eachdimension to be equal (i.e., it works equally to generate an n×mstructure). In an illustrated embodiment, database 32 stores records 33of potential contacts for the marketing automation/Campaign Managementsoftware 30. The records 33 have fields that specify an audience (e.g.,customer ID), and for each audience ID, other attributes (e.g., age andincome) of the customer. Other examples of different audiences (e.g.,household, account, customer, business), types of data, or differenttypes of records can be used.

The process 40 initializes 41 a indices m and n to m=1 and n=1 andissues 41 b a master query to retrieve a list of unique record ID's,e.g., Customer ID's. The process 40 issues 42, the queries of the form,Select <Audience ID(s)> from <DB table> where <query condition> order by<Audience ID(s)>″ to the database 32 to retrieve lists of Customer IDsthat satisfy each of the queries. The <query condition> in each query isbased on the boundary conditions for each of the levels or sublevels ofa dimension. The details in the flow chart of issuing the query isillustrative only to convey the sense that in one approach multiplequeries are issued for the first dimension and thereafter multiplequeries are issued for the second dimension and so forth. Otherarrangements can be used of course.

In the example to be described, a count of customers with certain agesand incomes is desired. The query set can be organized to search thedatabase to retrieve Customer ID's over sublevels of ages and incomes,e.g., with age and income in this example each having four sublevels.The queries in this case might be:

-   -   Select Cust_ID from TableX where Age<25 order by Cust_ID    -   Select Cust_ID from TableX where Age>=25 and Age<35 order by        Cust_ID    -   Select Cust_ID from TableX where Age>=35 and Age<50 order by        Cust_ID    -   Select Cust_ID from TableX where Age>=50 order by Cust_ID    -   Select Cust_ID from TableX where Age<18 order by Cust_ID    -   Select Cust_ID from TableY where Income<25000 order by Cust_ID    -   Select Cust_ID from TableY where Income>=25000 and Income<50000        order by Cust_ID    -   Select Cust_ID from TableY where Income>=50000 and Income<75000        order by Cust_ID    -   Select Cust_ID from TableY where Income>=75000 order by Cust_ID

These queries return record identifiers, e.g., Customer ID's in a formof a list that are sorted 44 by Customer ID 14 into a like plurality ofsub-lists. In general, sorting is part of the process performed by thedatabase returning results from the queries. Alternatively, thesub-lists that returned can be sorted using any efficient sortingtechnique. The process merges 46 the returned lists according tointersections between age and income (dimensions of data in thesub-lists) by scanning the sub-lists to produce count information thatis used to populate a cross-tabulation structure (FIG. 6) to indicatehow many records exist in each combination of age and income sublevels.While the database could contain a very large number, e.g., a billion ormore rows or records, by applying the process 40 the results areobtained quickly.

In the illustrative embodiment of building a 4×4 cross-tabulationstructure, the process 40 issues 42 four queries to produce sub-lists ofcustomer ID's that are in age bracket 1, customer ID's that are in agebracket 2, customer ID's that are in age bracket 3, and customer ID'sthat are in age bracket 4. The process also issues 42 four additionalqueries to produce sub-lists of customer ID's that are in income bracketa, customer ID's that are in income bracket b, customer ID's that are inincome bracket c, and customer ID's that are in income bracket d. Thetotal number of queries in this example is 8, which is one query foreach age bracket and one for each income bracket. There is no need forthe number of brackets for each dimension to be the same as they are inthis example. Ideally, each list of customer ID's are already sorted bythe database. Based on those 8 queries, the process sorts 44 ifnecessary and finds 46 the cross tabulation between qualifying age andincome and populates the 4×4 cross-tabulation structure (FIG. 6) asfurther explained below.

Referring to FIG. 4, the database 32 (exemplary depiction) has recordscorresponding to 8 customers with customer ID's A-H and these customershave ages and incomes that fall within groups 1-4 and a-d asillustrated. Thus, customer A has an age in sublevel 1 and an income insublevel b, customer B has an age in sublevel 2 and an income insublevel c and so forth.

Referring now to FIG. 5, the process 40 produces a master list 62 of allthe customers here A-H and issues a query to return a sub-list 64 a ofall Customer ID's that have ages that fall within sublevel 1 (which areCustomer ID's A, C, and F). The process issues a second query to returna sub-list 64 b of all Customer ID's that have ages that fall withinsublevel 2, (which are Customer ID's B and E), a third query to return asub-list 64 c of all Customer ID's that have ages that fall withinsublevel 3, (which are Customer ID's D and G), and fourth query toreturn a sub-list 64 d of all Customer ID's that have age that fallwithin sublevel 4 (which is Customer ID H).

A second set of queries is issued for income, the second dimension ofthe structure. The second set has a fifth query to return a sub-list 66a of customer ID's for Income for “sublevel a” which are Customer ID's Cand E. A sixth query is issued to return a sub-list 66 b of customerID's for income for “sublevel “b, which are Customer ID's A and D, aseventh query returns a sub-list 66 c of customer ID's for income for“sublevel c”, which are Customer ID's B, and F and an eighth query isissued to return a sub-list 66 d of customer ID's for income for“sublevel d”, which are Customer ID's G and H.

Thus, between the two sets of queries (one set for age and one set forincome), 8 queries are issued since each dimension of age and income has4 sublevels. The number of queries issued is the sum of the number ofsublevels, not the product. The sorted lists 62, 64 a-64 d and 66 a-66 dare indexed by cursors or pointers 63, 67 a-67 d and 69 a-69 drespectively.

Referring now to FIG. 6, the process 40 merges 46 those lists by lookingfor intersections and thus generates a two dimensional array 80 havingas dimensions the sublevels 1, 2, 3, 4, for the dimension “age” and thesublevels a, b, c and d for the dimension “income.” The process 40produces an n×n structure (e.g., 4×4) where n is the number of sub-listsfor each dimension. Thus, each cell of the structure 80 is anintersection corresponding to the sublevel of each dimension, age andincome. The cell is populated with a value that represents the number oftimes that there was an intersection (common Customer ID) between asublevel of age and a sublevel of income.

Referring to FIG. 5 and FIG. 7, scanning or merging 46 of the sub-listsis accomplished by initializing 46 a the cursors 63, 67 a-67 d and 69a-69 d at the top of each of the sub-lists 62, 64 a-64 d and 66 a-66 drespectively to the value one (FIG. 5). The merging process 46 alsoinitializes indices of the lists 64 a-64 d and 66 a-66 d to the valueone, which in FIG. 7 are represented as dimension n (age, income) andsub-lists i, where n=2 and i=4 (for both dimensions). Initially thecursor 63 for list 62 points to a location where the first sorted ID(value Customer ID “A” in this example) is stored in the master list.The cursor 67 a at list 64 a in this example also points to a locationwhere the value Customer ID “A” is stored in the sub-list 64 arepresenting those customers that have an age that falls in agesublevel 1. The cursor 67 b at list 64 b in this example points to alocation where the value Customer ID “B” is stored and so forth.

The process 46 iterates over the lists in the first dimension to findthe Customer ID “A” by reading 46 b the entry at the top of a first listcomparing 46 c it to the current value in the master list andincrementing the index of the list being examined 46 d until the valueCustomer ID “A” is found. Finding that occurrence ends the loop if thelists are mutually exclusive, otherwise, an indication of a match isstored and the value of i is incremented to check the remaining lists.The process 46 stores the indication that list 64 a had the value ofCustomer ID “A” and increments 46 f the value “n” to find the occurrenceof A in the second dimension, e.g., sub-lists 66 a-66 d corresponding toincome. The process loops through those lists till it finds Customer ID“A” in sub-list 66 b. Finding of Customer ID A in both dimensions is anintersection of those two dimensions (Age and Income) so that the cell(1,b) in the two dimensional array 80, in the simplest case, isincremented 46 h to have a value of “1” indicating that there was aintersection between income sublevel b and age sublevel 1. Invariations, computations other than count can be calculated (e.g., min,max, average, sum, etc. of some other attribute or field).

After the Customer ID “A” is found in all dimensions (here two) thecursors for the sub-lists (here sub-lists 64 a and 66 b) where A wasfound are incremented 46 i. The cursor 63 is also incremented 46 j forthe customer list 62 to Customer ID “B” and the process repeats untilall entries in the master list 62 have been used.

The merging process 46 scans down the lists by incrementing the cursorswhen merging 46 finds intersections of age and income. The intersectionsare used to populate the two-dimensional array 80 (FIG. 6). Thesingle-pass scanning process can be visualized as popping each entry offof the list, analogous to incrementing pointers and popping entries offof stacks. In the lists 62, 64 a-64 d and 66 a-66 d, the entries areguaranteed to be in order because the entries are sorted. The lists aresorted alphabetically if the values are text strings or numerically ifthey are numbers. Any sort order can be used as long as the sortcriteria are consistent across the master list and all sub-lists.

The process 46 calculates the values for each cell in the structure 80,which could be simple counts. The process 46 scans all the lists in onepass. The process goes down the master list 62 of CIDs and looks for avalue of that CID in sub-lists 64 a-64 d and 66 a-66 d. When the processfinds the value of the CID for all dimensions of data in the sub-lists64 a-64 d and 66 a-66 d, the process performs the required calculations(e.g., adds the occurrence to the value already in the cell forcomputing simple counts) in the cross-table and increments only thosecursors of cursors 67 a-67 d and 69 a-69 d of the sub-lists where thevalues were found. Thus, the initial sorting of the results of the queryallows the cross-tabulation structure to be constructed from a singlelinear pass through the sub-lists 64 a-64 d and 66 a-66 d.

If the sub-lists of a dimension are mutually exclusive (i.e., thesub-lists do not have common members and the queries used to from thesub-lists had disjoint boundaries), once the process 46 finds the CID ina sub-list of a dimension, the process 46 no longer needs to searchthrough the other sub-lists for that dimension, as is indicated in 46 fof FIG. 7. If the sub-lists of a dimension are not mutually exclusive,(i.e., the sub-lists may have common members and the queries used toform the sub-lists had overlapping boundaries), then once the processfinds the CID in one sub-list of a dimension, the process still scansthe remaining sub-lists of that dimension for additional occurrences ofthat value of CID.

The process 40 allows the user to specify the dimensions and the raw SQLstatements, thus allowing the user to specify 2 dimensions, 3dimensions, and so forth. The process 40 executes the sets of queriesfor each specified dimension only once, while the construction of eachstructure is accomplished by a single-pass matching/merging process ofthe sorted ID lists.

The process 40 performs pre-aggregation of data for fastdisplay/drilling by computing a structure quickly after some initialsorting operations. The process 40 can work over multiple dimensions ofdata (e.g., age, income), where it is need to aggregate data overmultiple dimensions (2 or more) for analysis avoiding an exponentialgrowth problem situation. The algorithm performs a very efficient 1-passthrough the data.

The process 40 allows the user to specify the dimensions in a querystatement, thus allowing the user to specify 2 dimensions, 3 dimensions,and so forth. The process 40 executes sets of queries for each specifieddimension only once, while constructing a structure by performingmatching/merging processes on sorted ID lists, e.g., 64 a-64 d and 66a-66 d. The process 40 performs pre-aggregation of data for fastdisplay/drill-down by computing structure 80 quickly. The process canwork over multiple dimensions of data, where it is needed to aggregatedata over multiple dimensions for analysis while avoiding an exponentialgrowth situation. The algorithm performs a very efficient 1-pass throughthe data.

The process 40 provides a number of performance improvements overcompeting processes. For instance, speed of calculations is based on sumof the number of bins over all dimensions, though if multiplehierarchical levels of a dimension can be rolled up from lower levels,only queries for the lowest level of granularity need to be executed,further increasing the computational efficiency. For a single-dimensionquery, the computation is of order (n log n), where n is the number ofrows of data being processed, assuming that the data is not sorted. Forcalculating multiple dimensions, the calculation is of the order of nlog n times m, where m is the number of levels across all dimensions (orthe number at the most granular levels across all dimensions if thehierarchy can be rolled up from lower levels). If the data is returnedfrom a database with the fields already sorted, the calculationcomplexity is of the order (n×m). This approach can be 10 to 100 timesfaster than competing approaches which have calculations on the order ofn*the number of complex queries=f (number of dimensions and levels).

Furthermore, the process 40 simplifies the queries that are required tobe executed by the database 32. Two queries of the form “Field1=X” and“Field2=Y” are computationally more efficient to execute than a singlequery of the form “Field1=X AND Field2=Y”. Not only does thecross-tabulation process 40 reduce the number of queries required from ageometric progression to a linear one, it also reduces the complexity ofthe queries to be executed. This adds to the performance advantage ofthis approach.

The process 40 can be used with more than two dimensions, e.g., adding a3rd dimension (age, income, geography) to the example, requires 12queries (assuming each dimension has 4 sublevels) to handle 64 totalcells. The number of required queries to execute the cross-tabulation 40increases linearly (n+m+ . . . +x), where n, m, . . . , x represent thenumber of sublevels in each dimension, while analysis increasesgeometrically (n m* . . . *x).

Referring to FIG. 8, another example of a cross-tabulation structure 90here having three dimensions is shown. In FIG. 8, the third dimension(e.g., territory) is added to the query to produce data from thedatabase. The number of cells thus increases by a factor of the numberof levels of “territory.” The table in FIG. 8 has age, income, andterritory dimensions (each with 4 sublevels, and hence 64 cells in thestructure 90). The territory dimensions are denoted as W, X, Y and Z.The number of queries that the process generates is 3×4=12 plus 1 queryfor the master list for a total of 13 queries. With the 13 queries, theprocess can handle 64 cells of accumulation (4*4*4).

Computing efficiency for an increased number of dimensions is related tothe number of levels in each dimension. For example, assume that alongthe age dimension is a top Level “All”, sublevels “Young|Middle|Old”,and each of the sublevels are broken down into further sub-sublevels“16-21, 22-25, 26-30”, “31-35, 36-40, 41-50”, and “51-60, 61-70, 71+.”Thus, there is one level ALL, there are sublevels YOUNG, MID and OLD,and underneath the sublevels there are 9 additional sub-sublevels ofnumerical age groupings. In this situation, if a user wanted tocompletely compute the cross-product through all of the levels and beable to determine how many people are young what income at specifiedlevel, there would be a larger number of cells in the cube.

When upper levels can be easily computed from lower levels (i.e., theboundaries of lower levels roll up cleaning into upper levels), thenumber of queries that the process would issue would be equal to the sumof numbers of the lowest level per dimension. So the number ofcomputations is equal to the sum across all dimensions over the numberof bins in the lowest dimension.

If the bins overlap then the number of queries is equal to not justnumber of bins in the lowest level, but the number of bins overall. Inthis case the number of queries would be 9 queries for thesub-sublevels, plus 3 queries for the sublevels for a total of 12queries to generate the sublists for the AGE dimension. If the problemalso now has 12 income dimensions, there are 144 cross intersections,but the process only has to issue 25 queries (12 for each dimension plusone query to generate the master list) to get the 144cross-intersections. The more complex the levels are in a singledimension (both in depth as well as in the number of bins/granularity)and the larger are the number of dimensions, the higher the number ofcomputations that are required.

Another feature of the technique is that the analysis can be easilyperformed over groups of cells. Assume that there are 50 groups of cells(which can be disjoint or overlapping) for which age and incomecomputations are desired. Issuing queries would provide 50 lists of Idsfor which 50 different cross tabulations would be computed. Thus, ifthere are 50 groups of cells for an age/income analysis, the processwould combine (e.g., “OR”) all of the IDs into a single long masterlist, which is sorted and deduped (duplicates removed). Thereafter theprocess is similar to working on a single cell, except that indexes arealso kept in each of the original 50 ID lists to determine which of the50 cross-tabs are incremented as IDs are processed from the master list.The process produces one cross-tabulation table (n×n structure) to holdthe count for each of the groups. The process scans down the masterlist, each of the 50 segment lists, and the dimension sub-lists in thesingle pass and aggregates values to the appropriate cross-tabulationcells.

Other embodiments are possible for the computation of multiple segmentsfor the same dimension. For instance, lists for each bin in eachdimension can be periodically pre-computed for the entire population.Once these lists are generated, the process can use the arbitrarysegments of population and compare them against the segmented list ofcustomer IDs to find intersections. That is, no matter how many segmentsthere are, the process does not need to issue any queries to get thelists for the dimension bins. This allows the process to generate cubesvery fast for any segment without issuing any query for counts (and onlyissuing one query to get the fields that are needed to accumulate orprocess the cells of the cubes).

The process can be expanded to perform other functions on the datarepresented in the database. Thus, in addition to summing, the processcan provide average counts, minimum counts, maximum counts, a standarddeviation of another variable (e.g., sum of account balances, averagedtenure), and so forth. The additional variable(s) are brought back aspart of the master list and are referenced for the required computations(rather than bringing the variable back with each sub-list). The processcan also compute non-intersection of cells.

A number of embodiments of the invention have been described.Nevertheless, it will be understood that various modifications may bemade without departing from the spirit and scope of the invention. Forexample, age, income and territory are examples of 3 attributes orcustomer characteristic. Other characteristics could be used asdimensions for instance, recency of purchase, frequency of purchase andan aggregate of amount of purchases so called RFM characteristics.Accordingly, other embodiments are within the scope of the followingclaims.

1-48. (canceled)
 49. A computer-implemented method of mining a database having records with each record including a record identifier and a plurality of dimensions, comprising: issuing, to the database, a plurality of queries for multiple sublevels of data for a first dimension; issuing, to the database, a plurality of queries for multiple sublevels of data for a second dimension; receiving, from the database and for each one of the queries, a sub-list of record identifiers; and identifying, by traversing the received sub-lists, an occurrence of an intersection of a sublevel of the first dimension with a sublevel of the second dimension.
 50. The method of claim 49, wherein a cross-tabulation structure is generated using the received sublists.
 51. The method of claim 50, wherein the cross-tabulation structure is a multiple-dimensional array having a structure dimensioned with the number of sublevels for the first dimension and the number of sublevels for the second dimension.
 52. The method of claim 51, wherein the identified occurrence of the intersection is used to populate the array, and each cell in the array is populated based upon a total number of intersections for the sublevel in the first dimension and sublevel in the second dimension associated with the cell.
 53. The method of claim 49, wherein the occurrence is determined based upon a record identifier being found in a sublevel of the first dimension and a sublevel of the second dimension.
 54. The method of claim 49, wherein the plurality of dimensions includes three or more dimensions.
 55. A computer program product, comprising: a computer readable storage device having stored therein instructions for mining a database having records with each record including a record identifier and a plurality of dimensions, the instructions, which when executed on a computer hardware system, cause the computer hardware system to perform: issuing, to the database, a plurality of queries for multiple sublevels of data for a first dimension; issuing, to the database, a plurality of queries for multiple sublevels of data for a second dimension; receiving, from the database and for each one of the queries, a sub-list of record identifiers; and identifying, by traversing the received sub-lists, an occurrence of an intersection of a sublevel of the first dimension with a sublevel of the second dimension.
 56. The computer program product of claim 55, wherein a cross-tabulation structure is generated using the received sublists.
 57. The computer program product of claim 56, wherein the cross-tabulation structure is a multiple-dimensional array having a structure dimensioned with the number of sublevels for the first dimension and the number of sublevels for the second dimension.
 58. The computer program product of claim 57, wherein the identified occurrence of the intersection is used to populate the array, and each cell in the array is populated based upon a total number of intersections for the sublevel in the first dimension and sublevel in the second dimension associated with the cell.
 59. The computer program product of claim 55, wherein the occurrence is determined based upon a record identifier being found in a sublevel of the first dimension and a sublevel of the second dimension.
 60. The computer program product of claim 55, wherein the plurality of dimensions includes three or more dimensions.
 61. A computer hardware system configured to mine a database having records with each record including a record identifier and a plurality of dimensions, comprising at least one hardware processor configured to initiate the following operations: issuing, to the database, a plurality of queries for multiple sublevels of data for a first dimension; issuing, to the database, a plurality of queries for multiple sublevels of data for a second dimension; receiving, from the database and for each one of the queries, a sub-list of record identifiers; and identifying, by traversing the received sub-lists, an occurrence of an intersection of a sublevel of the first dimension with a sublevel of the second dimension.
 62. The system of claim 61, wherein a cross-tabulation structure is generated using the received sublists.
 63. The system of claim 62, wherein the cross-tabulation structure is a multiple-dimensional array having a structure dimensioned with the number of sublevels for the first dimension and the number of sublevels for the second dimension.
 64. The system of claim 63, wherein the identified occurrence of the intersection is used to populate the array, and each cell in the array is populated based upon a total number of intersections for the sublevel in the first dimension and sublevel in the second dimension associated with the cell.
 65. The system of claim 61, wherein the occurrence is determined based upon a record identifier being found in a sublevel of the first dimension and a sublevel of the second dimension.
 66. The system of claim 61, wherein the plurality of dimensions includes three or more dimensions. 